There is loads of good material online about visualising data. This lecture collects together material from my favourite sources.
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data
(Tufte)
Pie charts: not a good idea
Neither are stacked and grouped bar charts
Worse still…
Use points not lines if element order is not relevant.
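A minimal sketch of the points-not-lines advice, assuming the built-in ChickWeight data and treating Diet as an unordered category. The object names (`final_day`, `diet_means`, `p_points`) are illustrative, not from the original slides.

```r
library(ggplot2)

# Mean final-day weight per diet; Diet is a nominal category with no natural order.
chicks <- as.data.frame(ChickWeight)
final_day <- subset(chicks, Time == max(Time))
diet_means <- aggregate(weight ~ Diet, data = final_day, FUN = mean)

# Points only: a connecting line would suggest a trend running from diet 1 to diet 4.
p_points <- ggplot(diet_means, aes(x = Diet, y = weight)) +
  geom_point(size = 3) +
  labs(x = "Diet", y = "Mean weight on final day (g)")
```

Swapping `geom_point()` for `geom_line(aes(group = 1))` would draw the misleading version for comparison.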
What’s wrong here?
Show data variation, not design variation
Graphics should not quote data out of context
Convey groups clearly – colour, fill, facet
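library(dplyr)  # provides %>% and mutate()
data("ChickWeight")
# Bin measurement day into weeks: days 0-3 are week 1, days 4-10 are week 2, and so on
ChickWeight <- ChickWeight %>% mutate(Week = factor(1 + round(Time/7)))
head(ChickWeight)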
##   weight Time Chick Diet Week
## 1     42    0     1    1    1
## 2     51    2     1    1    1
## 3     59    4     1    1    2
## 4     64    6     1    1    2
## 5     76    8     1    1    2
## 6     93   10     1    1    2
Using fill to show weight distributions for each diet
Using fill + colour to show weight distributions for each diet
Weight distributions for each week
Weight distributions for combinations of diet (fill) and week (colour)
Weight distributions for combinations of diet and week (with interaction)
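The figures for these captions are not reproduced here. The sketch below shows one way such plots could be drawn, assuming boxplots (the original slides may have used a different geom). The `chicks` data frame and the plot names are illustrative.

```r
library(ggplot2)

# Rebuild the Week factor so this block runs on its own
# (same rule as the mutate() call in the text).
chicks <- transform(as.data.frame(ChickWeight),
                    Week = factor(1 + round(Time / 7)))

# One box per diet, distinguished by fill.
p_diet <- ggplot(chicks, aes(x = Diet, y = weight, fill = Diet)) +
  geom_boxplot()

# Diet on fill and week on outline colour: one dodged box per diet/week pair.
p_both <- ggplot(chicks, aes(x = Diet, y = weight, fill = Diet, colour = Week)) +
  geom_boxplot()

# Explicit grouping by the diet-by-week interaction on the x axis.
p_inter <- ggplot(chicks, aes(x = interaction(Diet, Week), y = weight, fill = Diet)) +
  geom_boxplot()
```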
Use fill for one grouping variable and faceting for the other
Try both ways – often gives interesting (different) perspectives
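"Both ways" can be sketched as below, again assuming boxplots and a locally rebuilt Week factor. Which variable goes on fill and which on the facets is swapped between the two plots.

```r
library(ggplot2)

chicks <- transform(as.data.frame(ChickWeight),
                    Week = factor(1 + round(Time / 7)))

# Fill by diet, one panel per week: compares diets within each week.
p_by_week <- ggplot(chicks, aes(x = Diet, y = weight, fill = Diet)) +
  geom_boxplot() +
  facet_wrap(~ Week, nrow = 1)

# The other way round: fill by week, one panel per diet,
# which shows growth over time within each diet.
p_by_diet <- ggplot(chicks, aes(x = Week, y = weight, fill = Week)) +
  geom_boxplot() +
  facet_wrap(~ Diet, nrow = 1)
```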
Avoid cross-hatching or other patterns that distract the mind from the information being presented
Axes should include or nearly include the range of data, with data filling up the plot
## Warning: Removed 101 rows containing missing values (geom_point).
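The warning above is what ggplot2 emits when axis limits exclude some of the data. The sketch below shows how that can happen and one alternative. The 150 g cutoff is an arbitrary illustration, so the number of dropped rows need not match the 101 in the warning.

```r
library(ggplot2)

p <- ggplot(ChickWeight, aes(x = Time, y = weight)) + geom_point()

# Scale limits discard out-of-range rows before drawing; printing this plot
# produces a "Removed ... rows" warning.
p_clipped <- p + scale_y_continuous(limits = c(0, 150))

# coord_cartesian() only zooms the view; no rows are dropped.
p_zoomed <- p + coord_cartesian(ylim = c(0, 150))

# How many rows the illustrative 150 g limit would drop.
n_dropped <- sum(ChickWeight$weight > 150)
```

Leaving the limits at their defaults keeps every point in view, which is usually what the guideline asks for.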
Don’t insist that zero always be included
Consider a log scale when the data span several orders of magnitude, or when percentage change matters more than absolute change
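A short sketch of a log-scaled axis on the ChickWeight data, followed by the arithmetic behind the "% change" point. The variable names are illustrative.

```r
library(ggplot2)

p_log <- ggplot(ChickWeight, aes(x = Time, y = weight, group = Chick)) +
  geom_line(alpha = 0.4) +
  scale_y_log10() +
  labs(x = "Time (days)", y = "Weight (g, log scale)")

# On a log axis, equal ratios span equal distances:
# a doubling from 50 to 100 covers the same height as one from 100 to 200.
step_small <- log10(100) - log10(50)
step_large <- log10(200) - log10(100)
```

On a linear axis the second doubling would look twice as tall as the first, even though the percentage change is the same.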
Don’t forget to specify units and label axes. Tick intervals should ideally be at nice round numbers.
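For the ChickWeight data (weight in grams, time in days since birth), labels and round tick positions might be set as follows. The break choices are a suggestion, not taken from the original figures.

```r
library(ggplot2)

# Round tick positions: weekly on x, every 100 g on y.
x_breaks <- seq(0, 21, by = 7)
y_breaks <- seq(0, 400, by = 100)

p_labelled <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(alpha = 0.4) +
  # Axis labels name the quantity and its unit.
  labs(x = "Time since birth (days)", y = "Body weight (g)") +
  scale_x_continuous(breaks = x_breaks) +
  scale_y_continuous(breaks = y_breaks)
```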
Use the same scales when graphs are compared
Think about whether to compare vertically or horizontally
Horizontal comparison is easy with the ggplot option facet_grid(. ~ suburb)
Vertical comparison uses facet_grid(suburb ~ .)
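The `suburb` data behind these calls is not included here, so the sketch below uses Diet from ChickWeight as a stand-in faceting variable. `facet_grid()` keeps scales fixed across panels by default, which also satisfies the same-scales advice.

```r
library(ggplot2)

base_plot <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(alpha = 0.4)

# Panels side by side: the shared y axis makes weights easy to compare.
p_horizontal <- base_plot + facet_grid(. ~ Diet)

# Panels stacked: the shared x axis makes timing easy to compare.
p_vertical <- base_plot + facet_grid(Diet ~ .)
```

As a rule of thumb, put the panels along the direction of the axis you most want readers to compare.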
The aspect ratio can be suggested by the data (e.g. spatial data), but otherwise try for roughly 3:2
Prepare graphics in the final aspect ratio to be used. Never “copy-and-stretch”!
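One way to fix the aspect ratio at save time is `ggsave()` with explicit width and height. The sizes and the temporary PDF path below are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(ChickWeight, aes(x = Time, y = weight)) + geom_point()

# Save at the final size: 6 in wide by 4 in high gives a 3:2 aspect ratio.
fig_width <- 6
fig_height <- 4
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = fig_width, height = fig_height, units = "in")
```

Inserting the file at its saved size, rather than resizing it in the document, keeps text and point sizes as designed.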
Avoid dark shaded backgrounds
Avoid dark, dominating grid lines
Check that any very thin lines don’t disappear on resizing/printing
theme_bw() is a good default option
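A small sketch of `theme_bw()` applied to a single plot, with the minor grid lines removed as one optional way of keeping the grid from dominating.

```r
library(ggplot2)

p_clean <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(alpha = 0.4) +
  theme_bw() +                                # white background, light grey grid
  theme(panel.grid.minor = element_blank())   # drop the minor grid lines
```

Calling `theme_set(theme_bw())` once at the top of a script makes it the default for every subsequent plot in the session.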
Avoid cluttered legends
Where possible, add labels directly to the elements of the plot rather than use a legend at all.
Use the ggrepel package to avoid overlap between labels
If this won’t work, then keep the legend from obscuring the plotted data, and make it small and neat
Legend inside the plot margins or outside? Data trumps legend. If there are blank regions near one or more corners, put it inside. If not (or if it would obscure data), put it outside
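The legend advice above can be sketched as follows, assuming mean weight per diet over time as the plotted series. Direct labels are drawn with `geom_text_repel()` from ggrepel, and the fallback moves a compact legend below the panel. The data frame and plot names are illustrative.

```r
library(ggplot2)
library(ggrepel)

# Mean weight per diet at each measurement time.
diet_means <- aggregate(weight ~ Diet + Time,
                        data = as.data.frame(ChickWeight), FUN = mean)
# Label positions: the last time point of each diet's line.
line_ends <- subset(diet_means, Time == max(Time))

# Direct labels at the line ends replace the legend entirely.
p_direct <- ggplot(diet_means, aes(x = Time, y = weight, colour = Diet)) +
  geom_line() +
  geom_text_repel(data = line_ends, aes(label = paste("Diet", Diet)),
                  nudge_x = 1, direction = "y") +
  theme(legend.position = "none")

# Fallback when direct labels do not fit: a legend outside the panel.
p_legend <- ggplot(diet_means, aes(x = Time, y = weight, colour = Diet)) +
  geom_line() +
  theme(legend.position = "bottom")
```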
Write out explanations of the data on the graphic itself. Label important events in the data.
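As a sketch of on-plot explanation, the block below labels the heaviest single measurement in ChickWeight. The choice of what to label is an illustrative assumption. The label text is built from the data rather than typed in.

```r
library(ggplot2)

chicks <- as.data.frame(ChickWeight)
# The heaviest single measurement, looked up from the data.
heaviest <- chicks[which.max(chicks$weight), ]
label_text <- sprintf("Heaviest: %g g (chick %s, diet %s)",
                      heaviest$weight,
                      as.character(heaviest$Chick),
                      as.character(heaviest$Diet))

p_annotated <- ggplot(chicks, aes(x = Time, y = weight)) +
  geom_point(alpha = 0.4) +
  # hjust > 1 places the text just to the left of the labelled point.
  annotate("text", x = heaviest$Time, y = heaviest$weight,
           label = label_text, hjust = 1.05)
```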
Avoid overlap as much as possible
Plots should be self-explanatory, so captions should be detailed.
Proofread carefully that any text (including the caption) doesn’t contradict what’s in the figure (integrated reporting approaches like R Markdown can help with this)
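One way to keep caption and figure in agreement is to compute the caption from the same data. This is a minimal sketch with illustrative names; in an R Markdown document the same values could be used in inline code.

```r
library(ggplot2)

chicks <- as.data.frame(ChickWeight)
# Build the caption from the data so the text cannot drift from the figure.
n_chicks <- length(unique(chicks$Chick))
n_diets <- nlevels(chicks$Diet)
n_obs <- nrow(chicks)
caption_text <- sprintf("Body weight of %d chicks on %d diets (%d measurements).",
                        n_chicks, n_diets, n_obs)

p_captioned <- ggplot(chicks, aes(x = Time, y = weight, colour = Diet)) +
  geom_point(alpha = 0.5) +
  labs(x = "Time since birth (days)", y = "Body weight (g)",
       caption = caption_text)
```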